Abstract

Given a data stream $D = \{p_1, p_2, \dots, p_m\}$ of size $m$ of numbers from
$\{1, \dots, n\}$, the frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The
$k$-th frequency moment of $D$ is defined as $F_k = \sum_{i=1}^{n} f_i^k$. We consider
the problem of approximating frequency moments in insertion-only streams for
$k \ge 3$. For any constant $c$ we show an $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ upper
bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the
iterative log function. To simplify the presentation, we make the following
assumptions: $n$ and $m$ are polynomially far; approximation error $\epsilon$ and
parameter $k$ are constants. We observe a natural bijection between streams and
special matrices. Our main technical contribution is a non-uniform sampling
method on matrices. We call our method a pick-and-drop sampling; it samples
a heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability
$\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f}_i \ge (1 - \epsilon) f_i$. In addition,
the estimations never exceed the real values, that is $\tilde{f}_j \le f_j$ for all $j$.
As a result, we reduce the space complexity of finding a heavy element to
$O(n^{1-2/k} \log(n))$ bits. We apply our method of recursive sketches and resolve
the problem with $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ bits.
1 Introduction

Given a sequence $D = \{p_1, p_2, \dots, p_m\}$ of size $m$ of numbers from $\{1, \dots, n\}$, the
frequency of $i$ is defined as

$f_i = |\{j : p_j = i\}|$. (1)

The $k$-th frequency moment of $D$ is defined as

$F_k = \sum_{i=1}^{n} f_i^k$. (2)
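As a concrete reading of definitions (1) and (2), the following minimal sketch computes $F_k$ exactly by storing every frequency. It uses linear space and is therefore not a streaming algorithm; the function name `frequency_moment` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

def frequency_moment(stream, k):
    """Return F_k = sum_i f_i^k, where f_i counts the occurrences of i."""
    freqs = Counter(stream)                 # f_i for every i that appears
    return sum(f ** k for f in freqs.values())

stream = [1, 3, 1, 2, 1, 3]                 # f_1 = 3, f_2 = 1, f_3 = 2
print(frequency_moment(stream, 1))          # 6, the stream length m
print(frequency_moment(stream, 3))          # 27 + 1 + 8 = 36
```

The streaming problem asks for an approximation of the same quantity without storing the frequency table.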

The problem of approximating frequency moments in one pass over $D$ using
sublinear space was introduced in the award-winning paper of Alon, Matias
and Szegedy. In particular, they observed a striking difference between
"small" and "large" values of $k$: it is possible to approximate $F_k$ for $k \le 2$
in polylogarithmic space, but polynomial space is required when $k > 2$. Since
1996, approximating $F_k$ has become one of the most inspiring problems in the
theory of data streams. The incomplete list of papers on frequency moments
includes [...] and references therein. We omit the detailed history of the
problem and refer the reader to [25], [29] for overviews.

In this paper we consider the case when $k \ge 3$. In their breakthrough paper,
Indyk and Woodruff [19] gave the first solution that is optimal up to a
polylogarithmic factor. Numerous improvements were proposed in later years
(see the references above), and the latest bounds are due to Andoni,
Krauthgamer and Onak, and to Ganguly. The latest bound by Ganguly is

$O(k^2 \epsilon^{-2} n^{1-2/k} E(k, n) \log(n) \log(nmM) / \min(\log(n), \epsilon^{4/k-2}))$

where $E(k, n) = (1 - 2/k)^{-1} (1 - n^{-4(1-2/k)})$. This bound is roughly
$O(n^{1-2/k} \log^2(n))$ for constant $\epsilon, k$. The best known lower bound for
insertion-only streams is $\Omega(n^{1-2/k})$, due to Chakrabarti, Khot and Sun.

We consider the problem of approximating frequency moments in insertion-only
streams for $k \ge 3$. For any constant $c$ we show an $O(n^{1-2/k} \log(n) \log^{(c)}(n))$
upper bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the
iterative log function. To simplify the presentation, we make the following
assumptions: $n$ and $m$ are polynomially far; approximation error $\epsilon$ and
parameter $k$ are constants. We observe a natural bijection between streams and
special matrices. Our main technical contribution is a non-uniform sampling
method on matrices. We call our method a pick-and-drop sampling; it samples a
heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability
$\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f}_i \ge (1 - \epsilon) f_i$. In addition, the
estimations never exceed the real values, that is $\tilde{f}_j \le f_j$ for all $j$. As a
result, we reduce the space complexity of finding a heavy element to
$O(n^{1-2/k} \log(n))$ bits. We apply our method of recursive sketches [6] and
resolve the problem with $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ bits. We do not try to
optimize the space complexity as a function of $\epsilon$.
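To make the factor $\log^{(c)}(n)$ concrete, here is a minimal sketch of the iterative log function, read as the logarithm applied $c$ times. The choice of base 2 and the name `iterated_log` are assumptions made for illustration; the text does not fix a base.

```python
import math

def iterated_log(n, c):
    """Apply log base 2 to n exactly c times (base 2 is an assumption)."""
    x = float(n)
    for _ in range(c):
        x = math.log2(x)
    return x

print(iterated_log(65536, 1))   # 16.0
print(iterated_log(65536, 2))   # 4.0
print(iterated_log(65536, 3))   # 2.0
```

Even for small constant $c$ the extra factor grows extremely slowly in $n$.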



Overview of Main Ideas 

Pick-and-drop sampling has been inspired by a very natural behavior of children. 
We observed the following pattern: a child picks a toy, briefly plays with it, then 
drops the toy and picks a new one. This pattern is repeated until the child picks the 
favorite toy and keeps it for a long time. Indeed, children develop algorithms for
selectivity [11].

To illustrate the pick-and-drop method by example, assume that $m = r \cdot t$
where $r = \lceil n^{1/k} \rceil$ and consider the $r \times t$ matrix $M$ with entries
$m_{i,j} = p_{(i-1)t+j}$. For $m \le n$ we aim to solve the following promise problem
with probability 2/3:

• Case 1: all frequencies are either zero or one.

• Case 2: $z$ appears in every row of $M$ exactly once (thus $f_z = r$). All other
frequencies are either zero or one.

Consider the following sampling method. Pick $r$ i.i.d. random numbers
$I_1, \dots, I_r$, where $I_i$ is uniformly distributed on $\{1, 2, \dots, t\}$. For each
$i = 1, \dots, r - 1$ we check if there is a duplicate of $m_{i,I_i}$ in the row $i + 1$.
If the duplicate is found then we output "Case 2" and stop; otherwise we repeat
the test for $i + 1$. That is, the $i$-th sample is "dropped," and the $(i+1)$-th
sample is "picked". We repeat this experiment $T$ times independently and output
"Case 1" if no duplicate is found. Note that if the input represents Case 1
then our method will always output "Case 1." Consider Case 2 and observe that
if $m_{i,I_i} = z$ then our method will output "Case 2". Indeed, since $z$ appears in
every row, the duplicate of $z$ will be found. The probability to miss $z$
entirely is

$(1 - \frac{1}{t})^{rT}$. (3)

Recall that $m \le n$, $m = rt$, $r = \lceil n^{1/k} \rceil$. If $T = O(n^{1-2/k})$ with a sufficiently
large constant then the probability of error (3) is smaller than 1/3. We
conclude that our promise problem can be resolved with $O(n^{1-2/k} \log(n))$ space.
Note how our solution depends on $r$. In general, the matrix should be carefully
chosen.
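The sampling test for the promise problem can be sketched as follows. This is an offline illustration that holds the whole matrix in memory, which a streaming implementation would not do; the function name `detect_case` and the toy matrices are assumptions introduced here.

```python
import random

def detect_case(M, T, rng):
    """Return 2 if a sampled entry reappears in the next row, else 1."""
    r, t = len(M), len(M[0])
    for _ in range(T):                       # T independent experiments
        for i in range(r - 1):
            sample = M[i][rng.randrange(t)]  # pick m_{i, I_i}
            if sample in M[i + 1]:           # duplicate found in row i + 1
                return 2
            # no duplicate: the sample is dropped, row i + 1 is sampled next
    return 1

rng = random.Random(0)
case1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # all frequencies <= 1
case2 = [[0, 1, 2, 3], [4, 0, 5, 6], [7, 8, 0, 9]]      # z = 0 in every row
print(detect_case(case1, 50, rng))   # 1: Case 1 inputs never produce a duplicate
print(detect_case(case2, 50, rng))
```

On a Case 1 input the answer is always 1, matching the one-sided error described above; on a Case 2 input the answer is 2 unless every sample misses $z$.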
Unfortunately the distribution of the frequent element in the stream can be
arbitrary. Also our algorithm must recognize "noisy" frequencies that are large
but negligible. Clearly, the sampling must be more intricate but, luckily, not by
much. In particular, the following method works. We introduce a local counter
for each sample that counts the number of times $m_{i,I_i}$ appears in the suffix of
the $i$-th row (this counting method is used by Alon, Matias and Szegedy for the
entire stream). We maintain a global sample (and a global counter) as functions
of the local samples and counters. Initially the global sample is the local
sample of the first row. Under certain conditions, the global sample can be
"dropped." If this is the case then the local sample of the current row is
"picked" and becomes the new global sample. The global sample is "dropped" when
the local counter exceeds the global one. Also, the global sample is dropped if
the global counter does not grow fast enough. We use the function $\lambda q$ where
$\lambda$ is a parameter and $q$ is the number of rows that the global counter
survived. If the global counter is smaller than $\lambda q$ then the global sample is
"dropped."

In our analysis we concentrate on the case when 1 is the heavy element, but
it is possible to repeat our arguments for any $i$. Our main technical contribution
is Theorem 2.1, which claims that 1 will be output with probability $\Omega(f_1/t)$ for
sufficiently large $f_1$. Interestingly, Theorem 2.1 holds for arbitrary distributions
of frequencies. In Theorem 3.6 we show that there exist $r, t, \lambda$ such that a bound
similar to (3) holds. We combine our new method with [6] and obtain our main
result in Theorem 3.8.



2 Pick-and-Drop Sampling 

Let $M$ be a matrix with $r$ rows and $t$ columns and with entries $m_{i,j} \in [n]$. For
$i \in [r]$, $j \in [t]$, $l \in [n]$ define:

$d_{i,j} = |\{j' : j \le j' \le t, m_{i,j'} = m_{i,j}\}|$, (4)

$f_{l,i} = |\{j \in [t] : m_{i,j} = l\}|$, (5)

$f_l = |\{(i,j) : m_{i,j} = l\}|$, (6)

$F_k = \sum_{l=1}^{n} f_l^k, \quad G_k = F_k - f_1^k$. (7)

Note that there is a bijection between $r \times t$ matrices $M$ and streams $D$ of size
$r \times t$ with elements $p_{(i-1)t+j} = m_{i,j}$, where the definitions (1), (2) and (6), (7)
define equivalent frequency vectors for a matrix and the corresponding stream.
W.l.o.g., we will consider streams of size $r \times t$ for some $r, t$ and will
interchange the notions of a stream and its corresponding matrix.
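The bijection between streams and matrices is a row-major reshaping. A minimal sketch, with 0-based Python indices standing in for the 1-based indices of the text; the helper names `stream_to_matrix` and `matrix_to_stream` are hypothetical.

```python
def stream_to_matrix(stream, t):
    """Cut a stream of length r * t into r consecutive rows of length t."""
    if t <= 0 or len(stream) % t != 0:
        raise ValueError("stream length must be a positive multiple of t")
    return [list(stream[i:i + t]) for i in range(0, len(stream), t)]

def matrix_to_stream(M):
    """Inverse map: read the matrix row by row."""
    return [x for row in M for x in row]

stream = [1, 2, 1, 3, 4, 1]
M = stream_to_matrix(stream, 3)
print(M)                      # [[1, 2, 1], [3, 4, 1]]
print(matrix_to_stream(M))    # [1, 2, 1, 3, 4, 1]
```

Because the map only regroups positions, the frequency of every element is the same in the stream and in the matrix.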

Let $\{I_j\}_{j=1}^{r}$ be i.i.d. random variables with uniform distribution on $[t]$.
Define for $i = 1, \dots, r$:

$s_i = m_{i,I_i}, \quad c_i = d_{i,I_i}$. (8)

Let $\lambda$ be a parameter. Define the following recurrent random variables:

$S_1 = s_1, \quad C_1 = c_1, \quad q_1 = 1$. (9)

Also (for $i = 2, \dots, r$) if

$C_{i-1} < \max\{\lambda q_{i-1}, c_i\}$ (10)

then define

$S_i = s_i, \quad C_i = c_i, \quad q_i = 1$; (11)

otherwise, define

$S_i = S_{i-1}, \quad C_i = C_{i-1} + f_{S_i,i}, \quad q_i = q_{i-1} + 1$. (12)
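The recurrence (8)-(12) can be sketched directly. The version below is an offline illustration that reads whole rows from memory rather than processing a stream in one pass, and it uses 0-based column indices; the function name `pick_and_drop` and the toy matrix are assumptions introduced here.

```python
import random

def pick_and_drop(M, lam, rng):
    """Return (S_r, C_r) for one run of the recurrence (8)-(12)."""
    t = len(M[0])
    S = C = q = None
    for i, row in enumerate(M):
        I = rng.randrange(t)                 # I_i, uniform over the columns
        s = row[I]                           # local sample s_i
        c = row[I:].count(s)                 # local counter c_i = d_{i, I_i}
        if i == 0 or C < max(lam * q, c):    # first row, or condition (10)
            S, C, q = s, c, 1                # pick the local sample: (9), (11)
        else:
            C += row.count(S)                # keep the global sample: (12)
            q += 1
    return S, C

M = [[1, 2, 1, 3], [4, 1, 1, 5], [1, 6, 7, 1]]
S, C = pick_and_drop(M, 1, random.Random(0))
true_f = sum(row.count(S) for row in M)
print(S, C, true_f)                          # C never exceeds the true frequency
```

The counter only ever counts actual occurrences of the current sample, so the output counter is at most the true frequency of the output element.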

Theorem 2.1. Let $M$ be an $r \times t$ matrix. There exist absolute constants $\alpha, \beta$ such
that if

$\alpha (\lambda r + \frac{G_2}{t} + \frac{G_3}{\lambda t}) \le f_1 \le \beta t$ (13)

then

$P(S_r = 1) \ge \frac{f_1}{2t}$. (14)

Proof. Denote $Q = \{(i,j) : m_{i,j} = 1\}$. For $(i,j) \in Q$ define

$T_{i,j} = (\bar{A}_{i,j} \cap \bar{B}_{i,j} \cap \bar{H}_{i,j})$, (15)

where for $i > 1$:

$A_{i,j} = ((C_{i-1} \ge d_{i,j}) \cap (S_{i-1} \ne 1))$, (16)

for $i < r$:

$B_{i,j} = \bigcup_{h=i+1}^{r} ( d_{i,j} + \sum_{u=i+1}^{h-1} f_{1,u} < c_h )$, (17)

$H_{i,j} = \bigcup_{h=i+1}^{r} ( d_{i,j} + \sum_{u=i+1}^{h-1} f_{1,u} < (h - i)\lambda )$, (18)

and $A_{1,j} = B_{r,j} = H_{r,j} = \emptyset$. We have



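$((s_i = 1) \cap (S_{i-1} \ne 1) \cap \bar{A}_{i,I_i}) \subseteq ((s_i = 1) \cap (C_{i-1} < c_i)) \subseteq ((S_i = 1) \cap (q_i = 1))$. (19)

Consider the case when $S_i = 1$ and $q_i = 1$ and

$C_i + \sum_{u=i+1}^{h-1} f_{1,u} \ge \max\{\lambda (h - i), c_h\}$

for all $h > i$. In this case $S_h$ will be defined by (12) and not by (11): in
particular, $S_h = S_i = 1$. Therefore,

$((S_i = 1) \cap (q_i = 1) \cap \bar{B}_{i,I_i} \cap \bar{H}_{i,I_i}) \subseteq (\bigcap_{h=i}^{r} (S_h = 1))$. (20)

Define $V_1 = ((s_1 = 1) \cap T_{1,I_1})$ and, for $i > 1$, $V_i = ((s_i = 1) \cap (S_{i-1} \ne 1) \cap T_{i,I_i})$.

It follows from (19), (20) that, for any $i \in [r]$:

$V_i \subseteq (S_r = 1)$, (21)

and, for $i \ne j$:

$V_i \cap V_j = \emptyset$. (22)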

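Thus,

$\sum_{i=1}^{r} P(V_i) = P(\bigcup_{i=1}^{r} V_i) \le P(S_r = 1)$. (23)

For any $i > 1$:

$P(V_i) \ge P((s_i = 1) \cap T_{i,I_i}) - P(s_i = S_{i-1} = 1)$.

Also,

$\sum_{i=2}^{r} P(s_i = S_{i-1} = 1) \le \sum_{i=2}^{r} P((s_i = 1) \cap (\bigcup_{h=1}^{i-1} (s_h = 1))) \le (\sum_{i=1}^{r} P(s_i = 1))^2 = (\frac{f_1}{t})^2$.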

For any fixed $(i,j) \in Q$ events $I_i = j$ and $T_{i,j}$ are independent. Indeed, $A_{i,j}$
is defined by $\{S_{i-1}, C_{i-1}\}$ that, in turn, is defined by $\{I_1, \dots, I_{i-1}\}$.
Similarly, $B_{i,j}$ is defined by $\{I_{i+1}, \dots, I_r\}$. Note that $H_{i,j}$ is a deterministic
event. By definition, $\{I_1, \dots, I_{i-1}, I_{i+1}, \dots, I_r\}$ are independent of $I_i$, thus
the event $I_i = j$ and $T_{i,j} = \overline{(A_{i,j} \cup B_{i,j} \cup H_{i,j})}$ are independent. Thus,

$\sum_{i=1}^{r} P((s_i = 1) \cap T_{i,I_i}) = \sum_{(i,j) \in Q} P((I_i = j) \cap T_{i,j}) = \sum_{(i,j) \in Q} P(I_i = j) P(T_{i,j}) = \frac{1}{t} \sum_{(i,j) \in Q} P(T_{i,j})$. (24)

Thus,

$P(S_r = 1) \ge \frac{1}{t} \sum_{(i,j) \in Q} P(T_{i,j}) - (\frac{f_1}{t})^2$.

Lemma 2.2 implies that $\sum_{(i,j) \in Q} P(T_{i,j}) \ge 0.8 f_1$. Thus if $\beta \le 0.3$ then:

$P(S_r = 1) \ge \frac{f_1}{t} (0.8 - \frac{f_1}{t}) \ge \frac{f_1}{2t}$.

Here we only use the second part of (13). The first part is used in the proof of
Lemma 2.2. □

Lemma 2.2. There exist absolute constants $\alpha, \beta$ such that (13) implies

$\sum_{(i,j) \in Q} P(T_{i,j}) \ge 0.8 f_1$.

It follows from Lemmas 2.9, 2.17, 2.14 and the union bound that there exist at
least $0.97 f_1$ pairs $(i,j) \in Q$ such that $P(A_{i,j} \cup B_{i,j} \cup H_{i,j}) \le 0.02$. Recall that
$T_{i,j} = \overline{(A_{i,j} \cup B_{i,j} \cup H_{i,j})}$; the lemma follows.

2.1 Events of type A 

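For $(i,j) \in Q$ s.t. $i > 1$ and for $l > 1$ define:

$Y_{l,(i,j)} = 1_{A_{i,j}} 1_{(S_{i-1} = l)}$,

$Y_{l,i} = \sum_{j \in [t], (i,j) \in Q} Y_{l,(i,j)}$,

$Y_l = \sum_{i=2}^{r} Y_{l,i}$.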
Fact 2.3. $C_i \le f_{S_i}$. Also, if $q_i = 1$ then $C_i \le f_{S_i,i}$.

Proof. Follows directly from (11), (12). It is sufficient to prove that, for any $i$,
there exists a set $Q_i$ such that $C_i = |Q_i|$ and simultaneously $Q_i$ is a subset of
$\{(i',j) : m_{i',j} = S_i, i' \le i\}$. We prove the above claim by induction on $i$. For
$i = 1$ the claim is true since we can define $Q_1 = \{(1,j) : j \ge I_1, m_{1,j} = s_1\}$. For
$i \ge 2$ the description of the algorithm implies the following. If $q_i = 1$ then we
can put $Q_i = \{(i,j) : j \ge I_i, m_{i,j} = s_i\}$. If $q_i > 1$ then define
$Q_i = Q_{i-1} \cup \{(i,j) : m_{i,j} = S_i\}$. Note that in this case $S_i = S_{i-1}$. The second
part follows from the description of the algorithm: if $q_i = 1$ then $C_i = c_i$,
$S_i = s_i$ and $c_i = d_{i,I_i} \le f_{s_i,i}$. □

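Fact 2.4.

1. $Y_{l,i} \le f_l$.

2. If $q_{i-1} = 1$ then $Y_{l,i} \le f_{l,i-1}$.

Proof. Let $(i,j) \in Q$ be such that $d_{i,j} > f_l$; then:

$Y_{l,(i,j)} = 1_{A_{i,j}} 1_{(S_{i-1} = l)} \le 1_{(f_l < C_{i-1})} 1_{(S_{i-1} = l)} = 0$.

We use Fact 2.3 for the last equality. Thus, $Y_{l,(i,j)} = 0$. The definition of $d_{i,j}$
implies $|\{j : (i,j) \in Q, d_{i,j} \le f_l\}| \le f_l$ for any fixed $i$ and $l$. Thus,

$Y_{l,i} = \sum_{j \in [t], (i,j) \in Q} Y_{l,(i,j)} \le f_l$.

Part 2 follows by repeating the above arguments and using the second statement
of Fact 2.3. □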



Definition 2.5. Let $1 \le r_1 \le r_2 \le r$ and $l \in [n]$. Call a pair $[r_1, r_2]$ an $l$-epoch if

$\forall i = r_1, \dots, r_2 : S_i = l$,

and

$q_{r_1} = q_{r_2+1} = 1$

and

$\forall i = r_1 + 1, \dots, r_2 : q_i = q_{i-1} + 1$.
Lemma 2.6. Let $[r_1, r_2]$ be an $l$-epoch. If $r_2 > r_1$ then

$r_2 - r_1 \le \frac{1}{\lambda} \sum_{i=r_1}^{r_2-1} f_{l,i}$.

Proof. First, observe that $q_{r_2-1} = r_2 - r_1$. Second, $q_i > 1$ implies that $S_i$ is
defined by (12) and not by (11) for all $r_1 < i < r_2$. In particular, $C_{r_1} \le f_{l,r_1}$ and
for $r_1 < i < r_2$ we have $C_i = C_{i-1} + f_{l,i}$. Thus,

$C_{r_2-1} \le \sum_{i=r_1}^{r_2-1} f_{l,i}$.

Third, $C_{r_2-1} \ge \lambda q_{r_2-1}$ since (10) must be false for $i = r_2$. Therefore,

$r_2 - r_1 = q_{r_2-1} \le \frac{1}{\lambda} C_{r_2-1} \le \frac{1}{\lambda} \sum_{i=r_1}^{r_2-1} f_{l,i}$. □

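Lemma 2.7. $Y_l \le f_l + \frac{f_l^2}{\lambda}$.

Proof. Observe that the set $\{i : S_i = l\}$ is a collection of disjoint $l$-epochs. Recall
that $Y_l = \sum_{i=2}^{r} Y_{l,i}$ and $Y_{l,i}$ is non-zero only if $S_{i-1}$ is equal to $l$. Thus we can
rewrite $Y_l$ as a sum over all $l$-epochs $[r_1, r_2]$:

$Y_l = \sum_{[r_1, r_2]} \sum_{i=r_1}^{r_2} Y_{l,i+1}$.

For any epoch we have $Y_{l,r_1+1} \le f_{l,r_1}$ by the second part of Fact 2.4. For any
epoch such that $r_2 > r_1$ we have by Fact 2.4 and Lemma 2.6:

$\sum_{i=r_1}^{r_2} Y_{l,i+1} \le f_{l,r_1} + f_l (r_2 - r_1) \le f_{l,r_1} + \frac{f_l}{\lambda} \sum_{i=r_1}^{r_2-1} f_{l,i}$.

Since all epochs are disjoint we have

$Y_l \le f_l + \frac{f_l^2}{\lambda}$. □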
Lemma 2.8. $P(Y_l > 0) \le \frac{f_l}{t}$.

Proof. Since the $s_i$ are independent and $0 \le \frac{f_{l,i}}{t} \le 1$ we can apply Fact 2.10:

$P(\bigcap_{i=1}^{r} (s_i \ne l)) = \prod_{i=1}^{r} (1 - \frac{f_{l,i}}{t}) \ge 1 - \frac{f_l}{t}$.

Thus,

$P(Y_l > 0) \le P(\bigcup_{i=1}^{r} (s_i = l)) \le \frac{f_l}{t}$. (25)

□

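Lemma 2.9. There exists an absolute constant $\alpha$ such that (13) implies that
$P(A_{i,j}) \le 0.01$ for at least $0.99 f_1$ pairs $(i,j) \in Q$.

Proof. Let $Y = \sum_{l=2}^{n} Y_l$. From Lemmas 2.7, 2.8:

$E(Y) = \sum_{l=2}^{n} E(Y_l) \le \sum_{l=2}^{n} (f_l + \frac{f_l^2}{\lambda}) \frac{f_l}{t} = \frac{G_2}{t} + \frac{G_3}{\lambda t}$.

Since $A_{i,j}$ requires $S_{i-1} \ne 1$, it follows that $\sum_{(i,j) \in Q} 1_{A_{i,j}} = Y$. Recall that
by (13):

$|Q| = f_1 \ge \alpha (\frac{G_2}{t} + \frac{G_3}{\lambda t}) \ge \alpha E(\sum_{(i,j) \in Q} 1_{A_{i,j}})$.

Fact 2.11 implies that there exists an absolute constant $\alpha$ such that the lemma
is true. □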

The following fact is well known. For completeness we present the proof.

Fact 2.10. Let $a_1, \dots, a_r$ be real numbers in $[0, 1]$. Then

$\prod_{i=1}^{r} (1 - a_i) \ge 1 - (\sum_{i=1}^{r} a_i)$.

Proof. If $\sum_{i=1}^{r} a_i \ge 1$ then

$\prod_{i=1}^{r} (1 - a_i) \ge 0 \ge 1 - (\sum_{i=1}^{r} a_i)$.

Thus we can assume that $\sum_{i=1}^{r} a_i < 1$. We will prove the claim by induction on $r$.
For $r = 2$ we obtain $(1 - a_1)(1 - a_2) = (1 - a_1 - a_2 + a_1 a_2) \ge (1 - a_1 - a_2)$.
For $r > 2$, we have, by induction,

$\prod_{i=1}^{r} (1 - a_i) \ge (1 - (\sum_{i=1}^{r-1} a_i))(1 - a_r) \ge 1 - (\sum_{i=1}^{r} a_i)$.

□
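As a numerical sanity check of Fact 2.10 (not a substitute for the proof), the inequality can be evaluated on sample inputs. The function name `check_fact_2_10` and the small floating-point tolerance are assumptions made for this sketch.

```python
import math
import random

def check_fact_2_10(a, tol=1e-12):
    """True if prod(1 - a_i) >= 1 - sum(a_i), up to rounding error."""
    lhs = math.prod(1.0 - x for x in a)
    rhs = 1.0 - sum(a)
    return lhs >= rhs - tol

print(check_fact_2_10([0.1, 0.2, 0.3]))   # True: 0.504 >= 0.4

rng = random.Random(0)
samples = [[rng.random() for _ in range(rng.randint(1, 8))] for _ in range(1000)]
print(all(check_fact_2_10(a) for a in samples))
```

In Lemma 2.8 the fact is applied with $a_i = f_{l,i}/t$, so that the right-hand side becomes $1 - f_l/t$.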

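Fact 2.11. Let $X_1, \dots, X_u$ be a sequence of indicator random variables. Let
$S = \{i : P(X_i = 1) \le \nu\}$. If $E(\sum_{i=1}^{u} X_i) \le \mu u$ then $|S| \ge (1 - \frac{\mu}{\nu}) u$.

Proof. Indeed,

$\mu u \ge E(\sum_{i=1}^{u} X_i) = \sum_{i=1}^{u} P(X_i = 1) \ge \sum_{i \notin S} P(X_i = 1) \ge \nu (u - |S|)$.

□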

2.2 Events of type B 

For $(i,j) \in Q$ let $Z_{(i,j)} = 1_{B_{i,j}}$. Let $Z = \sum_{(i,j) \in Q} Z_{(i,j)}$. We use arguments that
are similar to the ones from the previous section. To stress the similarity we
abuse the notation and denote by $Y_{l,h,(i,j)}$ the indicator of the event that
$h \ge i + 1$, $s_h = l$ and

$( d_{i,j} + \sum_{u=i+1}^{h-1} f_{1,u} ) < c_h$.

Define $Y_{l,h} = \sum_{(i,j) \in Q} Y_{l,h,(i,j)}$, $Y_l = \sum_{h=1}^{r} Y_{l,h}$.
Fact 2.12. $Y_l \le f_l$.

Proof. Repeating the arguments from Fact 2.4 we have $c_h 1_{(s_h = l)} \le f_{l,h}$ and thus
$Y_{l,h} \le f_{l,h}$. □

Fact 2.13. $P(Y_l > 0) \le \frac{f_l}{t}$.

Proof. The proof is identical to the proof of Lemma 2.8. □

Lemma 2.14. There exist absolute constants $\alpha, \beta$ such that (13) implies that
$P(B_{i,j}) \le 0.01$ for at least $0.99 f_1$ pairs $(i,j) \in Q$.

Proof. Denote $Y = \sum_{l=1}^{n} Y_l$. It follows that $Z \le Y$ and $E(Z) \le E(Y)$. By Facts
2.13 and 2.12 it follows that $E(Y_l) \le \frac{f_l^2}{t}$. Thus by (13):

$E(Z) \le E(Y) \le \frac{F_2}{t} = \frac{G_2}{t} + \frac{f_1^2}{t} \le (\frac{1}{\alpha} + \beta) f_1$.

We repeat the arguments from Lemma 2.9. □
2.3 Events of type H

Definition 2.15. Let $U = \{u_1, \dots, u_t\}$ and $W = \{w_1, \dots, w_t\}$ be two sequences
of non-negative integers. Let $(i,j)$ be a pair such that $1 \le i \le t$ and $1 \le j \le u_i$.
Denote $(i,j)$ as a losing pair (w.r.t. sequences $U, W$) if there exists $h$, $i \le h \le t$,
such that:

$-j + \sum_{s=i}^{h} (u_s - w_s) < 0$.

Denote any pair that is not a losing pair as a winning pair.

In this section we consider the following pair $(U, W)$ of sequences. For
$i = 1, \dots, r$ let $u_i = f_{1,i}$ and $w_i = \lambda$.

Fact 2.16. If $(i,j)$ is a winning pair w.r.t. $(U, W)$ then $H_{i,j'}$ does not occur where
$j'$ is such that $m_{i,j'} = 1$ and $d_{i,j'} = f_{1,i} - j + 1$.

Proof. By Definition 2.15 for every $i \le h \le r$:

$-j + \sum_{l=i}^{h} u_l \ge \sum_{l=i}^{h} w_l$. (26)

Since $\sum_{l=i}^{h} w_l = (h - i + 1)\lambda$ and $d_{i,j'} = f_{1,i} - j + 1$ we have for every $i \le h \le r$:

$d_{i,j'} + \sum_{l=i+1}^{h} f_{1,l} = f_{1,i} - j + 1 + \sum_{l=i+1}^{h} f_{1,l} = -j + 1 + \sum_{l=i}^{h} u_l \ge -j + \sum_{l=i}^{h} u_l \ge \sum_{l=i}^{h} w_l = (h - i + 1)\lambda$.

Substitute $h$ by $h - 1$ (for $h > i$):

$d_{i,j'} + \sum_{l=i+1}^{h-1} f_{1,l} \ge (h - i)\lambda$.

Thus $H_{i,j'}$ does not occur, by (18). □

Lemma 2.17. There exists an absolute constant $\alpha$ such that (13) implies that $H_{i,j}$
does not occur for at least $0.99 f_1$ pairs $(i,j) \in Q$.

Proof. By Lemma 2.20 there exist at least

$\sum_{i=1}^{r} (u_i - w_i)$

winning pairs w.r.t. $(U, W)$. Also, $\sum_{i=1}^{r} u_i = \sum_{i=1}^{r} f_{1,i} = f_1$ and
$\sum_{i=1}^{r} w_i = \lambda r$. Thus there exist at least $f_1 - \lambda r$ winning pairs w.r.t.
$(U, W)$. In the statement of Fact 2.16 the mapping from $j$ to $j'$ is a bijection; thus
there exist at least $f_1 - \lambda r$ pairs $(i,j')$ s.t. $m_{i,j'} = 1$ and $H_{i,j'}$ does not occur. By
(13) we have $f_1 \ge \alpha \lambda r$ and the lemma follows. □

Definition 2.18. Let $U = \{u_1, \dots, u_t\}$ and $W = \{w_1, \dots, w_t\}$ be two sequences
of non-negative integers. Let $1 \le h < t$. Let $U', W'$ be two sequences of size $t - h$
defined by $u'_i = u_{i+h}$, $w'_i = w_{i+h}$ for $i = 1, \dots, t - h$. Denote $U', W'$ as the $h$-tail of
the sequences $U, W$.

Fact 2.19. If $(i,j)$ is a winning pair w.r.t. the $h$-tail of $U, W$ then $(i + h, j)$ is a
winning pair w.r.t. $U, W$.

Proof. Follows directly from Definitions 2.15 and 2.18. □

Lemma 2.20. If $\sum_{s=1}^{t} (u_s - w_s) > 0$ then there exist at least $\sum_{s=1}^{t} (u_s - w_s)$
winning pairs.

Proof. We use induction on $t$. For $t = 1$, any pair $(1, j)$ is winning if
$1 \le j \le u_1 - w_1$. Consider $t > 1$ and apply the following case analysis.

1. Assume that there exists $1 \le h < t$ such that $\sum_{s=1}^{h} (u_s - w_s) \le 0$. Consider
the $h$-tail of $U, W$. By induction and by Fact 2.19 there exist at least
$\sum_{s=h+1}^{t} (u_s - w_s) \ge \sum_{s=1}^{t} (u_s - w_s)$ winning pairs w.r.t. $U, W$.

2. Assume that $(1, u_1)$ is a winning pair; it follows that $(1, j)$, $j \le u_1$, is a
winning pair as well. If $\sum_{s=2}^{t} (u_s - w_s) > 0$ then, by induction and by Fact
2.19, there exist at least $\sum_{s=2}^{t} (u_s - w_s)$ winning pairs of the form $(i, j)$
where $i > 1$. In total there are $u_1 + \sum_{s=2}^{t} (u_s - w_s) \ge \sum_{s=1}^{t} (u_s - w_s)$
winning pairs w.r.t. $U, W$. The case when $\sum_{s=2}^{t} (u_s - w_s) \le 0$ is trivial.

3. Assume that (1), (2) do not hold. Then $u_1 > 0$. Indeed otherwise $u_1 - w_1 \le 0$
and thus (1) is true. Also $(1, 1)$ is a winning pair. Indeed, otherwise there
exists $1 \le h \le t$ such that $-1 + \sum_{i=1}^{h} (u_i - w_i) < 0$. All numbers are integers
thus $\sum_{i=1}^{h} (u_i - w_i) \le 0$ and (1) is true. Thus $(1, 1)$ is a winning pair and
$(1, u_1)$ is not a winning pair (by (2)). Therefore there exists $1 < u \le u_1$ such
that $(1, u - 1)$ is a winning pair and $(1, u)$ is not a winning pair. In particular,
there exists $1 \le h \le t$ such that

$-u + \sum_{s=1}^{h} (u_s - w_s) < 0$.

On the other hand $(1, u - 1)$ is a winning pair thus

$0 \le 1 - u + \sum_{s=1}^{h} (u_s - w_s)$.

All numbers are integers and thus we conclude that

$\sum_{s=1}^{h} (u_s - w_s) = u - 1$.

Consider the $h$-tail of $U, W$. By induction, there exist at least

$\sum_{i=h+1}^{t} (u_i - w_i) = \sum_{i=1}^{t} (u_i - w_i) - (u - 1)$

winning pairs w.r.t. the $h$-tail of $U, W$. By Fact 2.19 there exist at least as
many winning pairs w.r.t. $U, W$ of the form $(i, j)$ where $i > 1$. By properties
of $u$ there exist additional $(u - 1)$ winning pairs of the form $(1, j)$, $j \le u - 1$.

Summing up we obtain the fact.

□
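Definition 2.15 and the bound of Lemma 2.20 can be checked by brute force on small inputs. The sketch below enumerates winning pairs directly from the definition; the function name `winning_pairs` and the example sequences are assumptions chosen for illustration.

```python
def winning_pairs(u, w):
    """List the winning pairs (i, j), 1-based, of Definition 2.15."""
    t = len(u)
    wins = []
    for i in range(t):
        for j in range(1, u[i] + 1):
            balance = -j
            losing = False
            for h in range(i, t):            # every h with i <= h <= t
                balance += u[h] - w[h]
                if balance < 0:              # the defining inequality holds
                    losing = True
                    break
            if not losing:
                wins.append((i + 1, j))
    return wins

u, w = [3, 0, 2], [1, 1, 1]
print(winning_pairs(u, w))                   # [(1, 1), (3, 1)]
print(sum(a - b for a, b in zip(u, w)))      # 2, the bound of Lemma 2.20
```

In this example the number of winning pairs equals the sum of $u_s - w_s$, so the bound of Lemma 2.20 is tight here.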



3 The Streaming Algorithm 

Fact 3.1. Let v\, . . . , v n be a sequence of non-negative numbers and let k > 2. 
Then 

Proof. Define X% = - . Since g(x) = x k 1 is convex on the interval [0, oo) 
we can apply Jensen's inequality and obtain: 

□ 
Let $D$ be a stream. Define



i-(i/k) r 1 / k 

71 U k ^ ^ = 2 ro.5 log 2 ^ 4 

^1 



i/fc 



A 



(27)

where we use (2) to define $F_k$. We will make the following assumptions:

$f_1 \le 0.1 F_1, \quad t \le F_1, \quad F_1 \pmod{t} = 0$. (28)

Then it is possible to define an $r \times t$ matrix $M$, where $r = F_1/t$, with
entries $m_{i,j} = p_{(i-1)t+j}$.



Fact 3.2. $1 \le \delta \le 2 n^{(k-1)/2k}$.

Proof. Indeed, $G_1 \le G_k^{1/k} n^{1-1/k}$ by Hölder inequality and since $f_1 \le 0.1 F_1$ by
(28) we have $\psi \ge 0.5$; thus, $\lceil 0.5 \log_2(\psi) \rceil \ge 0$ and the lower bound follows. Also,
$F_k^{1/k}$ is the $L_k$ norm for the frequency vector since all frequencies are
non-negative. Since $L_k \le L_1$ we conclude that $\psi \le n^{1-1/k}$ and the fact follows. □

Observe that there exists a frequency vector with $\delta = O(1)$: put $f_i = 1$ for all
$i \in [n]$. At the same time there exists a vector with $\delta = \Omega(n^{(k-1)/2k})$: put $f_1 = n$
and $f_j = 1$ for $j \ge 2$. It is not hard to see that if $\delta$ is sufficiently large then a
naive sampling method will find a heavy element. For example, in the latter case,
the heavy element occupies half of the stream.



Fact 3.3. $\lambda r \le 4 G_k^{1/k}$.

Proof. Recall that $F_1 = rt$. The fact follows from the definitions of $\lambda$ and $t$. □

Fact 3.4.

$\frac{G_2}{t} \le G_k^{1/k}$.



Proof. Define a 



Also, by FactO 



Thus, 



2 ^._ 3 2 ) ■ We have by Holder inequality: 



Go < G, 



G\- a 



Go<G 



I F 

G k — 



n 



«(i-f) 



G 



k-3 
k{k—2) 



k-3 

n 2* . 



k-l 1 j 

q 2(fc — 2) ^ Q2(k-T) 



(29) 



(30) 



fc-3 , _ 1 l 

fc ( fe - 2 > n i^G? (fc - 2) ^ 3 



n 



Fx 



1/2 



□ 
Fact 3.5. $\frac{G_3}{\lambda t} \le G_k^{1/k}$.

Proof. By Hölder inequality,

$G_3 \le G_k^{3/k} n^{1-(3/k)}$. (31)



Thus 

At - F 2 S 4 ~ k ■ 

□ 

Theorem 3.6. Let $M$ be an $r \times t$ matrix such that (27) is true. Then there exist
absolute constants $\alpha, \beta$ such that

$\alpha G_k^{1/k} \le f_1 \le \beta t$ (32)

imply



P(S r = 1) > r-riwEv 03) 



2n l-(2/fe) ' 



Proof. By (32) and Facts 3.3, 3.4, 3.5:

$\frac{\alpha}{6} (\lambda r + \frac{G_2}{t} + \frac{G_3}{\lambda t}) \le f_1 \le \beta t$.

Also, dZTJ) implies f x /t > x J (2/h) ■ Thus, ([33} follows from Theorem O □ 

Algorithm 1 describes our implementation of the pick-and-drop sampling.

Theorem 3.7. Denote $f_i$ such that $f_i^k \ge 100 \sum_{j \ne i} f_j^k$ as a heavy element. There
exists a (constructive) algorithm that makes one pass over the stream and uses
$O(n^{1-2/k} \log(n))$ bits. The algorithm outputs a pair $(i, \tilde{f}_i)$ such that $\tilde{f}_i \le f_i$
with probability 1. If there exists a heavy element $f_i$ then also with constant
probability the algorithm will output $(i, \tilde{f}_i)$ such that $(1 - \epsilon) f_i \le \tilde{f}_i$.

Proof. Define $t$ as in (27). W.l.o.g., we can assume that $F_1$ is divisible by $t$. Note
that if $t > F_1$ or $f_1 > 0.1 F_1$ then it is possible to find a heavy element with
$O(n^{1-2/k})$ bits by existing methods. Otherwise, a stream $D$ defines a matrix $M$ for
which we compute $O(n^{1-2/k} / (\epsilon \delta))$ independent pick-and-drop samples. Since we do
not know the value of $\delta$ we should repeat the experiment for all possible values
of $\delta$. Output the element with the maximum frequency. With constant probability
the output of the pick-and-drop sampling will include a $(1 - \epsilon)$ approximation of
the frequency $f_i$. Thus, there will be no other $f_j$ that can give
Algorithm 1 P&D(M, r, t, $\lambda$)

  Generate i.i.d. r.v. $\{I_j\}_{j=1}^{r}$ with uniform distribution on $[t]$.
  $S_1 = m_{1,I_1}$,
  $C_1 = d_{1,I_1}$,
  $q_1 = 1$.
  for $i = 2 \to r$ do
    compute $s_i = m_{i,I_i}$, $c_i = d_{i,I_i}$
    if ($C_{i-1} < \max\{\lambda q_{i-1}, c_i\}$) then
      $S_i = s_i$,
      $C_i = c_i$,
      $q_i = 1$
    else
      $S_i = S_{i-1}$,
      $C_i = C_{i-1} + f_{S_i,i}$,
      $q_i = q_{i-1} + 1$
    end if
  end for
  Output $(S_r, C_r)$.



a larger approximation and replace a heavy element. The total space will define
a geometric series that sums to $O(n^{1-2/k} \log(n))$.

If we know $F_1$ ahead of time then we can compute the value of $t$ for any
possible $\delta$ and thus solve the problem in one pass. However, one can show that
the well-known doubling technique (when we double our parameter $t$ each time
the size of the stream doubles) will work in our case and thus one pass is sufficient
even without knowing $F_1$. □

Recall that in [6] we developed a method of recursive sketches with the following
property: given an algorithm that finds a heavy element and uses memory $\mu(n)$, it
is possible to solve the frequency moment problem in space $O(\mu(n) \log^{(c)}(n))$. In
[6] we applied recursive sketches with the method of Charikar et al. [9]. Thus, we
can replace the method from [9] with Theorem 3.7 and obtain:

Theorem 3.8. Let $\epsilon$ and $k$ be constants. There exists a (constructive) algorithm that
computes a $(1 \pm \epsilon)$-approximation of $F_k$, uses $O(n^{1-2/k} \log(n) \log^{(c)}(n))$ memory
bits, makes one pass and errs with probability at most 1/3.